Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems

Authors

Bekir Taner Dinçer

Iadh Ounis

Craig MacDonald

Abstract

The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems, as attempted by the TREC 2013 Web track, is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system’s effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems.
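To make the baseline dependence concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a URisk-style measure (the mean per-topic difference from the baseline, with losses weighted by 1 + alpha), which the abstract does not spell out. The system names, the per-topic scores, and the helper names `urisk`, `mean_within_topic_baseline`, and `rank_by_risk` are all hypothetical.

```python
from statistics import mean


def urisk(system, baseline, alpha=1.0):
    """URisk-style score (assumed form): mean per-topic difference from
    the baseline, with losses multiplied by (1 + alpha)."""
    total = 0.0
    for s, b in zip(system, baseline):
        delta = s - b
        total += delta if delta >= 0 else (1 + alpha) * delta
    return total / len(system)


def mean_within_topic_baseline(runs):
    """Reference effectiveness per topic: the mean over all systems."""
    return [mean(topic_scores) for topic_scores in zip(*runs.values())]


def rank_by_risk(runs, baseline, alpha=1.0):
    """System names ordered from best to worst risk-sensitive score."""
    scores = {name: urisk(run, baseline, alpha) for name, run in runs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-topic effectiveness scores: three systems, four topics.
runs = {
    "A": [0.50, 0.20, 0.40, 0.10],
    "B": [0.30, 0.30, 0.30, 0.30],
    "C": [0.10, 0.60, 0.20, 0.50],
}

ranking_vs_a = rank_by_risk(runs, runs["A"])
ranking_vs_c = rank_by_risk(runs, runs["C"])
ranking_vs_mean = rank_by_risk(runs, mean_within_topic_baseline(runs))

print("baseline = system A:", ranking_vs_a)
print("baseline = system C:", ranking_vs_c)
print("baseline = topic mean:", ranking_vs_mean)
```

On this toy data each choice of baseline gives a different ordering, and a system used as its own baseline scores exactly zero, so it ranks first whenever every other system has a negative score. This only illustrates the effect the abstract describes and says nothing about the TREC 2012 results.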

Similar resources

Tackling uncertainty in safety risk analysis in process systems: The case of gas pressure reduction stations

Industrial plants are subject to very dangerous events. Therefore, it is essential to carry out an efficient risk and safety analysis. In classical applications, risk analysis treats event probabilities as certain data, while knowledge of generic failure data is scarce and uncertain, which leads to biased and inconsistent alternative estimates. Then, in order to achieve...

Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Images, as information vehicles that can convey a large volume of information, are especially important in the field of medicine. The existence of different attributes of image features and various search algorithms in medical image retrieval systems, and the lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...

NEW CRITERIA FOR RULE SELECTION IN FUZZY LEARNING CLASSIFIER SYSTEMS

Designing an effective criterion for selecting the best rule is a major problem in the process of implementing Fuzzy Learning Classifier (FLC) systems. Conventionally, confidence and support, or combined measures of these, are used as criteria for fuzzy rule evaluation. In this paper, new entities, namely precision and recall from the field of Information Retrieval (IR) systems, are adapted as alternative...

Using Predicate-Argument Structures for Context-Dependent Opinion Retrieval

Current opinion retrieval techniques do not provide context-dependent relevant results. They use frequency of opinion words in documents or at proximity to query words, such that opinionated documents containing the words are retrieved regardless of their contextual or semantic relevance to the query topic. Thus, opinion retrieved for the qualitative analysis of products, performance measuremen...

Publication date: 2014
